Take-home Exercise 3

To explain factors affecting the resale prices of public housing in Singapore.

Wong Wei Ling www.google.com
11-06-2021

1 Objectives

To build hedonic pricing models to explain factors affecting the resale prices of public housing in Singapore. The hedonic price models must be built by using appropriate GWR methods. This study focuses on four-room flat and transaction period from 1st January 2019 to 30th September 2020.

2 Dataset

3 Install and Load Packages

packages = c('tidyverse', 'sf', 'spdep', 'tmap', 'ggpubr',
             'corrplot', 'units', 'olsrr', 'plyr',
             'matrixStats', 'GWmodel', 'httr')

for (p in packages){
if(!require(p, character.only = T)){
  install.packages(p)
}
  library(p,character.only = T)
}

Explanation on the uses of each package:

4 Data Import and Preparation for Geospatial Dataset

In this section, we have to import and ensure that the geospatial data is in a format that we can use to perform analysis. We have to check if it is being assigned the right CRS and code. Also, since we are working in the boundary of Singapore, we have to ensure that it is in SVY21 with the EPSG code of 3414 being assigned as well.

The datasets that we will prepare in this section are:

4.1 Import geospatial data set

mpsz_sf <- st_read(dsn = "data/geospatial", layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `C:\wlwong2018\IS415 Blog\_posts\2021-11-06-take-home-exercise-3\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
mrt_sf <- st_read(dsn = "data/geospatial", layer = "MRTLRTStnPtt")
Reading layer `MRTLRTStnPtt' from data source 
  `C:\wlwong2018\IS415 Blog\_posts\2021-11-06-take-home-exercise-3\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 185 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6138.311 ymin: 27555.06 xmax: 45254.86 ymax: 47854.2
Projected CRS: SVY21
bus_sf <- st_read(dsn = "data/geospatial", layer = "BusStop")
Reading layer `BusStop' from data source 
  `C:\wlwong2018\IS415 Blog\_posts\2021-11-06-take-home-exercise-3\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5156 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4427.938 ymin: 26482.1 xmax: 48282.5 ymax: 52983.82
Projected CRS: SVY21

4.2 Look into datasets

glimpse(mpsz_sf)
Rows: 323
Columns: 16
$ OBJECTID   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15~
$ SUBZONE_NO <int> 1, 1, 3, 8, 3, 7, 9, 2, 13, 7, 12, 6, 1, 5, 1, 1,~
$ SUBZONE_N  <chr> "MARINA SOUTH", "PEARL'S HILL", "BOAT QUAY", "HEN~
$ SUBZONE_C  <chr> "MSSZ01", "OTSZ01", "SRSZ03", "BMSZ08", "BMSZ03",~
$ CA_IND     <chr> "Y", "Y", "Y", "N", "N", "N", "N", "Y", "N", "N",~
$ PLN_AREA_N <chr> "MARINA SOUTH", "OUTRAM", "SINGAPORE RIVER", "BUK~
$ PLN_AREA_C <chr> "MS", "OT", "SR", "BM", "BM", "BM", "BM", "SR", "~
$ REGION_N   <chr> "CENTRAL REGION", "CENTRAL REGION", "CENTRAL REGI~
$ REGION_C   <chr> "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", "~
$ INC_CRC    <chr> "5ED7EB253F99252E", "8C7149B9EB32EEFC", "C35FEFF0~
$ FMEL_UPD_D <date> 2014-12-05, 2014-12-05, 2014-12-05, 2014-12-05, ~
$ X_ADDR     <dbl> 31595.84, 28679.06, 29654.96, 26782.83, 26201.96,~
$ Y_ADDR     <dbl> 29220.19, 29782.05, 29974.66, 29933.77, 30005.70,~
$ SHAPE_Leng <dbl> 5267.381, 3506.107, 1740.926, 3313.625, 2825.594,~
$ SHAPE_Area <dbl> 1630379.3, 559816.2, 160807.5, 595428.9, 387429.4~
$ geometry   <MULTIPOLYGON [m]> MULTIPOLYGON (((31495.56 30..., MULT~
glimpse(mrt_sf)
Rows: 185
Columns: 4
$ OBJECTID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ~
$ STN_NAME <chr> "EUNOS MRT STATION", "CHINESE GARDEN MRT STATION", ~
$ STN_NO   <chr> "EW7", "EW25", "NS14", "NS7", "EW18", "NS5", "EW28"~
$ geometry <POINT [m]> POINT (35782.96 33560.08), POINT (16790.75 36~
glimpse(bus_sf)
Rows: 5,156
Columns: 4
$ BUS_STOP_N <chr> "78221", "63359", "64141", "83139", "55231", "553~
$ BUS_ROOF_N <chr> "B06", "B01", "B13", "B07", "B02", "B03", "B10", ~
$ LOC_DESC   <chr> "BLK 231A CP", "HOUGANG SWIM CPLX", "AFT JLN TELA~
$ geometry   <POINT [m]> POINT (42227.96 39563.16), POINT (34065.75 ~

Results above show that:

4.3 Check for NA values

mpsz_sf[rowSums(is.na(mpsz_sf))!=0,]
Simple feature collection with 0 features and 15 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
 [1] OBJECTID   SUBZONE_NO SUBZONE_N  SUBZONE_C  CA_IND     PLN_AREA_N
 [7] PLN_AREA_C REGION_N   REGION_C   INC_CRC    FMEL_UPD_D X_ADDR    
[13] Y_ADDR     SHAPE_Leng SHAPE_Area geometry  
<0 rows> (or 0-length row.names)
mrt_sf[rowSums(is.na(mrt_sf))!=0,]
Simple feature collection with 0 features and 3 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO   geometry
<0 rows> (or 0-length row.names)
bus_sf[rowSums(is.na(bus_sf))!=0,]
Simple feature collection with 1 feature and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 22616.75 ymin: 47793.68 xmax: 22616.75 ymax: 47793.68
Projected CRS: SVY21
    BUS_STOP_N BUS_ROOF_N LOC_DESC                  geometry
264      47201        UNK     <NA> POINT (22616.75 47793.68)

Results above show that:

4.4 Remove NA values

bus_sf <- bus_sf[!rowSums(is.na(bus_sf))!=0,]

4.5 Check for NA values

mpsz_sf[rowSums(is.na(mpsz_sf))!=0,]
Simple feature collection with 0 features and 15 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
 [1] OBJECTID   SUBZONE_NO SUBZONE_N  SUBZONE_C  CA_IND     PLN_AREA_N
 [7] PLN_AREA_C REGION_N   REGION_C   INC_CRC    FMEL_UPD_D X_ADDR    
[13] Y_ADDR     SHAPE_Leng SHAPE_Area geometry  
<0 rows> (or 0-length row.names)
mrt_sf[rowSums(is.na(mrt_sf))!=0,]
Simple feature collection with 0 features and 3 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO   geometry
<0 rows> (or 0-length row.names)
bus_sf[rowSums(is.na(bus_sf))!=0,]
Simple feature collection with 0 features and 3 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC   geometry  
<0 rows> (or 0-length row.names)

Results above show that:

4.6 Check for duplicate values

mrt_sf[duplicated(mrt_sf$STN_NAME),]
Simple feature collection with 19 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 22949.03 ymin: 28735.78 xmax: 42362.71 ymax: 46418.56
Projected CRS: SVY21
First 10 features:
    OBJECTID                STN_NAME STN_NO                  geometry
33        33  ANG MO KIO MRT STATION   NS16 POINT (29807.27 39105.77)
97       105  MACPHERSON MRT STATION   DT26  POINT (34235.8 34256.43)
103      111       BUGIS MRT STATION   EW12 POINT (30491.56 31424.35)
114      122        EXPO MRT STATION   DT35 POINT (42362.71 35285.67)
116      124 BUONA VISTA MRT STATION   CC22 POINT (23251.76 32090.77)
121      129  MARINA BAY MRT STATION    CE2 POINT (30423.43 28735.78)
124      132   CHINATOWN MRT STATION   DT19 POINT (29238.97 29686.54)
134      142    TAMPINES MRT STATION   DT32 POINT (40213.03 37463.37)
140      148   SERANGOON MRT STATION   NE12 POINT (32480.07 36869.39)
144      152  PAYA LEBAR MRT STATION    CC9 POINT (34550.58 33300.34)
bus_sf[duplicated(bus_sf$BUS_STOP_N),]
Simple feature collection with 13 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 13488.02 ymin: 29969.22 xmax: 44144.57 ymax: 47806.75
Projected CRS: SVY21
First 10 features:
     BUS_STOP_N BUS_ROOF_N             LOC_DESC
350       58031        UNK      OPP CANBERRA DR
1569      51071        B21 MACRITCHIE RESERVOIR
2208      82221        B01               Blk 3A
2215      97079        B14  OPP ST. JOHN'S CRES
2392      62251        B03         BEF BLK 471B
2462      22501        B02             BLK 662A
2736      16119        NIL        TELETECH PARK
2976      58229       B01A          BLK 358A CP
3442      67421        NIL CHENG LIM STN EXIT B
3521      68091        B08         AFT BAKER ST
                      geometry
350   POINT (27089.69 47570.9)
1569 POINT (28305.37 36036.67)
2208 POINT (35308.74 33335.17)
2215 POINT (44144.57 38980.25)
2392 POINT (35500.36 39943.34)
2462 POINT (13488.02 35537.88)
2736 POINT (22318.89 29969.22)
2976 POINT (26094.86 47806.75)
3442 POINT (34741.77 42004.21)
3521 POINT (32038.84 43298.68)

Results above show that:

4.7 Remove duplicated values

mrt_sf <- mrt_sf[!duplicated(mrt_sf$STN_NAME),]
bus_sf <- bus_sf[!duplicated(bus_sf$BUS_STOP_N),]

4.8 Check for duplicated values

mrt_sf[duplicated(mrt_sf$STN_NAME),]
Simple feature collection with 0 features and 3 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO   geometry
<0 rows> (or 0-length row.names)
bus_sf[duplicated(bus_sf$BUS_STOP_N),]
Simple feature collection with 0 features and 3 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC   geometry  
<0 rows> (or 0-length row.names)

Results above show that:

4.9 Check CRS

st_crs(mpsz_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21[WGS84]",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(mrt_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21[WGS84]",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(bus_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

From the results above:

4.10 Assign CRS and check again

mpsz_sf <- st_set_crs(mpsz_sf, 3414)
mrt_sf <- st_set_crs(mrt_sf, 3414)
bus_sf <- st_set_crs(bus_sf, 3414)

st_crs(mpsz_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(mrt_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(bus_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

Results above show that:

4.11 Check for invalid geometries

length(which(st_is_valid(mpsz_sf) == FALSE))
[1] 9
length(which(st_is_valid(mrt_sf) == FALSE))
[1] 0
length(which(st_is_valid(bus_sf) == FALSE))
[1] 0

Results above show that:

4.12 Handle the invalid geometries and check again

mpsz_sf <- st_make_valid(mpsz_sf)

length(which(st_is_valid(mpsz_sf) == FALSE))
[1] 0
length(which(st_is_valid(mrt_sf) == FALSE))
[1] 0
length(which(st_is_valid(bus_sf) == FALSE))
[1] 0

Results above show that:

4.13 Plot geospatial data

tmap_mode('view')
tm_shape(mpsz_sf) +
  tm_polygons() +
tm_shape(bus_sf) +
  tm_dots(alpha = 0.4,
          col = "blue",
          size = 0.03)
tmap_mode('plot')

Results above show that:

4.14 Remove bus stops outside of Singapore boundary

4.14.1 Inspect names of bus stop

remove_busstop <- list(46211, 46219, 46239, 46609, 47701)
bus_sf[bus_sf$BUS_STOP_N %in% remove_busstop, ]
Simple feature collection with 5 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 17969.14 ymin: 49173.42 xmax: 20723.24 ymax: 52983.82
Projected CRS: SVY21 / Singapore TM
     BUS_STOP_N BUS_ROOF_N            LOC_DESC
765       47701        NIL          JB SENTRAL
1532      46239         NA          LARKIN TER
2257      46609         NA     KOTARAYA II TER
2269      46211        NIL JOHOR BAHRU CHECKPT
4346      46219        NIL JOHOR BAHRU CHECKPT
                      geometry
765  POINT (20370.54 49173.42)
1532 POINT (17969.14 52983.82)
2257  POINT (20025.46 49261.6)
2269 POINT (20623.18 49199.99)
4346 POINT (20723.24 49200.05)

Results above show that:

4.14.2 Remove bus stops from bus stop data

bus_sf <- bus_sf[!bus_sf$BUS_STOP_N %in% remove_busstop, ]

4.14.3 Check if bus stops are removed

bus_sf[bus_sf$BUS_STOP_N %in% remove_busstop, ]
Simple feature collection with 0 features and 3 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21 / Singapore TM
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC   geometry  
<0 rows> (or 0-length row.names)

Results above show that:

5 Data Import and Preparation of Aspatial Dataset

In this section, we will prepare the aspatial data sets which will be our independent variables for calibrating the hedonic pricing models in the later section.

Note: This section will not be ran to save computational time. Documentation on the *results are still written even though the output is not shown.

The datasets that we will prepare in this section are:

5.1 Import aspatial data sets

resale <- read_csv("data/aspatial/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv")
eldercare <- read_csv("data/aspatial/eldercare.csv")
hawker <- read_csv("data/aspatial/hawkercentre.csv")
park <- read_csv("data/aspatial/park.csv")
prisch <- read_csv("data/aspatial/general-information-of-schools.csv")
mall <- read_csv("data/aspatial/mall.csv")
supermarket <- read_csv("data/aspatial/supermarkets.csv")
kindergarten <- read_csv("data/aspatial/kindergarten.csv")
childcare <- read_csv("data/aspatial/childcare.csv")

5.2 Look into datasets

glimpse(resale)
glimpse(eldercare)
glimpse(hawker)
glimpse(park)
glimpse(prisch)
glimpse(mall)
glimpse(supermarket)
glimpse(kindergarten)
glimpse(childcare)

Results above show that:

5.3 Extract and filter necessary columns and data

resale <- resale %>%
  filter(flat_type == "4 ROOM",
         month >= "2019-01" & month <= "2020-09")

eldercare <- eldercare %>%
  select(NAME, Lng, Lat)

hawker <- hawker %>%
  select(NAME, Lng, Lat)

park <- park %>%
  select(NAME, Lng, Lat)

prisch <- prisch %>%
  filter(mainlevel_code == "PRIMARY") %>%
  select(school_name, address, gifted_ind)
  
supermarket <- supermarket %>%
  select(NAME, Lng, Lat)

kindergarten <- kindergarten %>%
  select(NAME, Lng, Lat)

childcare <- childcare %>%
  select(NAME, Lng, Lat)

5.4 Look into datasets again

glimpse(resale)
glimpse(eldercare)
glimpse(hawker)
glimpse(park)
glimpse(prisch)
glimpse(mall)
glimpse(supermarket)
glimpse(kindergarten)
glimpse(childcare)
# glimpse(bus)

Results above show that:

5.5 Check for NA values

resale[rowSums(is.na(resale))!=0,]
eldercare[rowSums(is.na(eldercare))!=0,]
hawker[rowSums(is.na(hawker))!=0,]
park[rowSums(is.na(park))!=0,]
prisch[rowSums(is.na(prisch))!=0,]
mall[rowSums(is.na(mall))!=0,]
supermarket[rowSums(is.na(supermarket))!=0,]
kindergarten[rowSums(is.na(kindergarten))!=0,]
childcare[rowSums(is.na(childcare))!=0,]

Results above show that:

5.6 Prepare resale Dataset

resale dataset has a section on its own as there are different preparation steps to be done:

5.6.1 Rename value in column

resale$street_name <- gsub("ST\\.", "SAINT", resale$street_name)

5.6.2 Combine columns for geo-coding

resale <- resale %>%
  mutate(`address` = paste(`block`, `street_name`, sep=' '))

5.6.3 Onemap API for geo-coding

5.6.3.1 Function to geo code

This function does the following:

library(httr)

geo_code <- function(address) {
  url <- "https://developers.onemap.sg/commonapi/search"
  query <- list("searchVal" = address, "returnGeom" = "Y", "getAddrDetails" = "N") #, "pageNum" = "1"
  res <- GET(url, query = query, verbose())

  output <- content(res) %>%
    as.data.frame() %>%
    select(results.X, results.Y, results.LONGITUDE, results.LATITUDE)

  return(output)
}

5.6.3.2 For loop to apply function (row 1 to 1000)

This code chunk:

resale$X <- 0
resale$Y <- 0
resale$LONGITUDE <- 0 
resale$LATITUDE <- 0 

for (i in 1:1000) {
  output <- geo_code(resale$address[i])

  resale$X[i] <- output$results.X
  resale$Y[i] <- output$results.Y
  resale$LONGITUDE[i] <- output$results.LONGITUDE
  resale$LATITUDE[i] <- output$results.LATITUDE
}

5.6.3.3 For loop to apply function (row 1001 and beyond)

for (i in 15001:15901) {
  output <- geo_code(resale$address[i])
  
  if (resale$X[i] == 0) {
    resale$X[i] <- output$results.X
    resale$Y[i] <- output$results.Y
    resale$LONGITUDE[i] <- output$results.LONGITUDE
    resale$LATITUDE[i] <- output$results.LATITUDE
  }
}

5.6.3.4 Write into .csv file

write_csv(resale, "data/aspatial/final_resale_prices_XY_LongLat.csv")

5.6.4 Import .csv file

resale_new <- read_csv("data/aspatial/final_resale_prices_XY_LongLat.csv")
glimpse(resale_new)

Results above show that:

5.6.5 Extract necessary columns

resale_new <- resale_new %>%
  select(month, town, flat_type, storey_range, floor_area_sqm, remaining_lease, resale_price, address, LONGITUDE, LATITUDE)

5.6.6 Manipulate storey_range column

resale_new <- resale_new %>%
  mutate(yesno = 1) %>%
  distinct %>%
  pivot_wider(names_from = storey_range, values_from = yesno, values_fill = 0)

5.6.7 Manipulate remaining_lease column

resale_new$remaining_lease_new <- gsub("years", "", paste(resale_new$remaining_lease))
resale_new$remaining_lease_new <- gsub("month", "", paste(resale_new$remaining_lease_new))
resale_new$remaining_lease_new <- gsub("s", "", paste(resale_new$remaining_lease_new))

resale_new$remaining_lease_yr <- substr(resale_new$remaining_lease_new, start = 1, stop = 2)
resale_new$remaining_lease_mth <- substring(resale_new$remaining_lease_new, 5, 6)

resale_new$remaining_lease_yr <- as.double(resale_new$remaining_lease_yr)
resale_new$remaining_lease_mth <- as.double(resale_new$remaining_lease_mth)


resale_new$remaining_lease_mth[is.na(resale_new$remaining_lease_mth)] <- 0

resale_new <- resale_new %>%
  mutate(`remaining_lease_final` = `remaining_lease_yr` + round((`remaining_lease_mth`/12),2))

drop <- c("remaining_lease_new", "remaining_lease_yr", "remaining_lease_mth")
resale_new <- resale_new[, !names(resale_new) %in% drop]

5.7 Prepare prisch (primary school) Dataset

prisch dataset has a section on its own as there are different preparation steps to be done:

5.7.1 Apply geo_code function

This code chunk:

prisch$LONGITUDE <- 0 
prisch$LATITUDE <- 0 

count <- 0
failed_count <- 0

for (i in 1:183){
  tryCatch( {
    output <- geo_code(prisch$address[i])
    count <- count + 1
    prisch$LONGITUDE[i] <- output$results.LONGITUDE
    prisch$LATITUDE[i] <- output$results.LATITUDE

  }, error = function(err) {
      count <- count + 1
      failed_count <- failed_count + 1
      cat('Failed to extract',count,'out of',length(prisch$address),'addresses')
    }
  )
}

5.7.2 Write output into a CSV file

write_csv(prisch, "data/aspatial/prisch.csv")

5.7.3 Import prisch Dataset

prisch <- read_csv("data/aspatial/prisch.csv")
glimpse(prisch)

Results above show that:

5.7.4 Verify that all primary schools have Latitude & Longitude

5.7.4.1 Find primary schools with missing Latitude & Longitude

prisch$school_name[prisch$LONGITUDE==0.0000]

Results above show that:

5.7.4.2 Assign Latitude & Longitude

prisch$LONGITUDE[prisch$school_name == "HOUGANG PRIMARY SCHOOL"] <- 103.881072
prisch$LATITUDE[prisch$school_name == "HOUGANG PRIMARY SCHOOL"] <- 1.3783581

prisch$LONGITUDE[prisch$school_name == "JING SHAN PRIMARY SCHOOL"] <- 103.8520152
prisch$LATITUDE[prisch$school_name == "JING SHAN PRIMARY SCHOOL"] <- 1.3722578

prisch$LONGITUDE[prisch$school_name == "WEST GROVE PRIMARY SCHOOL"] <- 103.6989833
prisch$LATITUDE[prisch$school_name == "WEST GROVE PRIMARY SCHOOL"] <- 1.3446809

prisch$LONGITUDE[prisch$school_name == "HWHITE SANDS PRIMARY SCHOOL"] <- 103.9612546
prisch$LATITUDE[prisch$school_name == "WHITE SANDS PRIMARY SCHOOL"] <- 1.365473

5.7.5 Filter good primary school

gd_prisch <- prisch %>%
  filter(gifted_ind == "Yes")

Results above show that:

6 Geospatial Wrangling

Note: This section will not be ran except for Section 6.5 and Section 6.6 to save computational time. Documentation on the results are still written even though the output is not shown.

The variables that we will prepare in this section are:

Locational factors

We will then combine the above variables with resale dataset which consists of the prepared Structural factors such as:

6.1 Convert Aspatial Dataframe into a sf object

6.1.1 Function to convert aspatial dataframe into a sf object

This function does the following:

convert_sf <- function(df, x, y) {
  st_as_sf(df,
            coords = c(x, y),
            crs=4326) %>%
  st_transform(crs=3414)
}

6.1.2 Apply function to convert aspatial dataframes into a sf object

resale_sf <- convert_sf(resale_new, "LONGITUDE", "LATITUDE")
eldercare_sf <- convert_sf(eldercare, "Lng", "Lat")
hawker_sf <- convert_sf(hawker, "Lng", "Lat")
park_sf <- convert_sf(park, "Lng", "Lat")
gd_prisch_sf <- convert_sf(gd_prisch, "LONGITUDE", "LATITUDE")
mall_sf <- convert_sf(mall, "longitude", "latitude")
supermarket_sf <- convert_sf(supermarket, "Lng", "Lat")
kindergarten_sf <- convert_sf(kindergarten, "Lng", "Lat")
childcare_sf <- convert_sf(childcare, "Lng", "Lat")
prisch_sf <- convert_sf(prisch, "LONGITUDE", "LATITUDE")

6.2 Prepare Independent Variables

6.2.1 Central Business District (CBD) of Singapore

longitude <- 103.851784
latitude <- 1.287953

cbd_coorinates_sf <- data.frame(longitude, latitude) %>%
  st_as_sf(., coords = c("longitude", "latitude"), crs=4326) %>%
  st_transform(crs=3414)

6.2.2 Function to calculate proximity of variables

This function does the following:

calulate_prox <- function(df1, df2, var) {
  distance_matrix <- st_distance(df1, df2) %>%
    drop_units()
  df1[,var] <- rowMins(distance_matrix)

  return(df1)
}

6.2.3 Apply function to calculate proximity

resale_sf <- calulate_prox(resale_sf, cbd_coorinates_sf, "cbd_prox") %>%
              calulate_prox(., eldercare_sf, "eldercare_prox") %>%
              calulate_prox(., hawker_sf, "hawker_prox") %>% 
              calulate_prox(., mrt_sf, "mrt_prox") %>%
              calulate_prox(., park_sf, "park_prox") %>%
              calulate_prox(., gd_prisch_sf, "gd_prisch_prox") %>%
              calulate_prox(., mall_sf, "mall_prox") %>%
              calulate_prox(., supermarket_sf, "supermarket_prox") 

6.2.4 Function to calculate number of variables within a certain distance

This function does the following:

calculate_num <- function(df1, df2, distance, var) {
  distance_matrix <- st_distance(df1, df2) %>% 
    drop_units()
  distance_matrix <- data.frame(distance_matrix)
  df1[,var] <- rowSums(distance_matrix <= distance)
  
  return(df1)
}

6.2.5 Apply function to calculate number of variables within a certain distance

resale_sf <- calculate_num(resale_sf, kindergarten_sf, 350, "kindergarten_num") %>%
              calculate_num(., childcare_sf, 350, "childcare_num") %>%
              calculate_num(., bus_sf, 350, "busstop_num") %>%
              calculate_num(., prisch_sf, 1000, "prisch_num")

6.3 Re-arrange columns in resale_sf

col_order <- c("month", "town", "flat_type", "remaining_lease", "address",
              "resale_price", "floor_area_sqm", "remaining_lease_final", 
              "01 TO 03", "04 TO 06", "07 TO 09", "10 TO 12", "13 TO 15", "16 TO 18",
              "19 TO 21", "22 TO 24", "25 TO 27", "28 TO 30", "31 TO 33", "34 TO 36",
              "37 TO 39", "40 TO 42", "43 TO 45", "46 TO 48", "49 TO 51",
              'cbd_prox', "eldercare_prox", "hawker_prox", "mrt_prox", "park_prox",
               "gd_prisch_prox", "mall_prox", "supermarket_prox",
              "kindergarten_num", "childcare_num", "busstop_num", "prisch_num", "geometry")
resale_sf <- resale_sf[, col_order]

glimpse(resale_sf)

6.4 Write to a shp file

st_write(resale_sf, "data/geospatial/final_resale.shp")

6.5 Import shp file

resale_sf <- st_read(dsn="data/geospatial", layer="final_resale")
Reading layer `final_resale' from data source 
  `C:\wlwong2018\IS415 Blog\_posts\2021-11-06-take-home-exercise-3\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 15853 features and 37 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 11597.31 ymin: 28217.39 xmax: 42623.63 ymax: 48741.06
Projected CRS: SVY21 / Singapore TM

6.6 Rename columns

resale_sf <- resale_sf %>%
  dplyr::rename(resale_price = rsl_prc, floor_area_sqm = flr_r_s, remaining_lease_final = rmnng__,
         lvl_01_TO_03 = X01TO03, lvl_04_TO_06 = X04TO06, lvl_07_TO_09 = X07TO09,
         lvl_10_TO_12 = X10TO12, lvl_13_TO_15 = X13TO15, lvl_16_TO_18 = X16TO18,
         lvl_19_TO_21 = X19TO21, lvl_22_TO_24 = X22TO24, lvl_25_TO_27 = X25TO27,
         lvl_28_TO_30 = X28TO30, lvl_31_TO_33 = X31TO33, lvl_34_TO_36 = X34TO36,
         lvl_37_TO_39 = X37TO39, lvl_40_TO_42 = X40TO42, lvl_43_TO_45 = X43TO45,
         lvl_46_TO_48 = X46TO48, lvl_49_TO_51 = X49TO51,
         cbd_prox = cbd_prx, eldercare_prox = eldrcr_, hawker_prox = hwkr_pr,
         mrt_prox = mrt_prx, park_prox = prk_prx, gd_prisch_prox = gd_prs_,
         mall_prox = mll_prx, supermarket_prox = sprmrk_, kindergarten_num  = kndrgr_,
         childcare_num  = chldcr_, busstop_num  = bsstp_n, prisch_num  = prsch_n)

7 Exploratory Data Analysis

In this section, we will perform some Exploratory Data Analysis to understand our dataset.

7.1 EDA using statistical graphics

7.1.1 Function to Plot Histogram

mul_hist <- function(df, vnam, x_axis) {
  ggplot(data=df, aes(x=vnam)) +
  geom_histogram(bins=20, color="black", fill="salmon") +
        scale_x_continuous(name = x_axis)
}

7.1.2 Plot distribution of resale_price

mul_hist(resale_sf, resale_sf$resale_price, "Resale Price")

Results above show that:

7.1.3 Normalise using Log Transformation

resale_sf <- resale_sf %>%
  mutate(`resale_price_log` = log(resale_price))

7.1.4 Plot resale_price_log

mul_hist(resale_sf, resale_sf$resale_price_log, "Resale Price Log")

Results above show that: